IMPROVING POINT AND INTERVAL ESTIMATES OF MONOTONE 
FUNCTIONS BY REARRANGEMENT 



VICTOR CHERNOZHUKOV, IVAN FERNANDEZ-VAL, AND ALFRED GALICHON



Abstract. Suppose that a target function is monotonic, namely, weakly increasing, and an 
available original estimate of this target function is not weakly increasing. Rearrangements, 
univariate and multivariate, transform the original estimate to a monotonic estimate that 
always lies closer in common metrics to the target function. Furthermore, suppose an original 
simultaneous confidence interval, which covers the target function with probability at least
$1 - \alpha$, is defined by upper and lower end-point functions that are not weakly increasing.
Then the rearranged confidence interval, defined by the rearranged upper and lower end-point
functions, is shorter in length in common norms than the original interval and also covers the
target function with probability at least $1 - \alpha$. We demonstrate the utility of the improved
point and interval estimates with an age-height growth chart example.

1. Introduction

A common problem in statistics is the estimation of an unknown monotonic function. Exam- 
ples of monotonic functions include biometric age-height charts, econometric demand functions, 
and quantile and distribution functions. If an original, potentially non-monotonic, estimate 
is available, then the rearrangement operation from variational analysis (Hardy, Littlewood, 
and Polya 1952, Lorentz 1953, Villani 2003) can be used to monotonize the original estimate. 
The rearrangement has been shown to be useful in producing monotonized estimates of density
functions (Fougeres 1997), conditional mean functions (Davydov and Zitikis 2005, Dette,
Neumeyer, and Pilz 2006, Dette and Scheder 2006), and various conditional quantile and
distribution functions; see, e.g., Chernozhukov, Fernandez-Val, and Galichon (2006) and the MIT
working paper "Quantile and Probability Curves without Crossing" by the authors.

In this paper, we use Lorentz inequalities and their appropriate generalizations to show that 
the rearrangement of the original estimate is not only useful for producing monotonicity, but 
also always improves upon the original estimate, whenever the latter is not monotonic. Thus, 
the rearranged curves are always closer to the target curve being estimated. Furthermore, 
this improvement property does not depend on the nature of the original estimate and applies 
to both univariate and multivariate cases. The improvement property of the rearrangement 
also extends to the construction of confidence bands for monotone functions. We show that 
we can increase the coverage probabilities and reduce the lengths of the confidence bands for 
monotone functions by rearranging their upper and lower bounds. 

Monotonization has a long history in the statistical literature, mostly in relation to isotone 
regression. We will not provide an extensive literature review, but reference a few other 
methods most related to the rearrangement. Mammen (1991) studies two-step estimators, 
including one with smoothing in the first step and monotonization by isotone regression in 
the second. Mammen, Marron, Turlach, and Wand (2001) show that this and many related 
procedures can be recast as projections with respect to a given norm. Another approach is the 
one-step procedure of Ramsay (1988), which projects on a class of monotone spline functions 
called I-splines. Later in the paper we will compare and combine these procedures with the 
rearrangement.



2. Improving Point Estimates of Monotone Functions by Rearrangement 

2.1. Formulation of the problem. A basic problem in many areas of statistics is the estimation
of an unknown target function $f_0 : \mathbb{R}^d \to \mathbb{R}$. Suppose we know that $f_0$ is monotonic,
namely weakly increasing, and an original estimate $\hat f$ is available, which is not necessarily
monotonic, but is theoretically attractive and computationally tractable otherwise. Many
common estimation methods do indeed produce such estimates. Can they always be improved
with no harm? The answer is yes: the rearrangement method transforms the original estimate
to a monotonic estimate $\hat f^*$, and this estimate is closer in common metrics to the true curve
$f_0$ than the original estimate $\hat f$. Furthermore, the rearrangement is computationally tractable,
and thus preserves the appeal of the original estimates.

Estimation methods used in regression analysis can be grouped into global methods and
local methods. An example of a global method is the series estimator of $f_0$ taking the form
$\hat f(x) = P_{k_n}(x)'\hat b$, where $P_{k_n}(x)$ is a $k_n$-vector of suitable transformations of the variable $x$,
such as B-splines, polynomials, and trigonometric functions, and

$$\hat b = \arg\min_{b \in \mathbb{R}^{k_n}} \sum_{i=1}^{n} \rho\{Y_i - P_{k_n}(X_i)'b\},$$

where $\{(Y_i, X_i), i = 1, \ldots, n\}$ denotes the data. In particular, using the square loss $\rho(z) = z^2$
produces estimates of the conditional mean of $Y_i$ given $X_i$ (Gallant 1981, Andrews 1991, Stone
1994, Newey 1997), while using the asymmetric absolute deviation loss $\rho(z) = \{u - 1(z < 0)\}z$
produces estimates of the conditional $u$-quantile of $Y_i$ given $X_i$ (Koenker and Bassett 1978,
Portnoy 1997, He and Shao 2000). The series estimates $x \mapsto \hat f(x) = P_{k_n}(x)'\hat b$ are widely
used in data analysis due to their desirable approximation and theoretical properties, and
computational tractability. However, they need not be monotone, unless explicit constraints
are added (Matzkin 1994, Silvapulle and Sen 2005, Koenker and Ng 2005).

Examples of local methods include kernel and local polynomial estimators. A kernel
estimator takes the form

$$\hat f(x) = \arg\min_{b \in \mathbb{R}} \sum_{i=1}^{n} w_i\, \rho(Y_i - b), \qquad w_i = K\left(\frac{X_i - x}{h}\right),$$

where the loss function $\rho$ plays the same role as above, $K(u)$ is a multivariate kernel function,
and $h > 0$ is a vector of bandwidths, with the division by $h$ taken componentwise (Wand and
Jones 1995, Ramsay and Silverman 2005). The resulting estimate $x \mapsto \hat f(x)$ need not be
monotone. Dette, Neumeyer, and Pilz (2006) show that the rearrangement transforms the kernel
estimate into a monotonic one. We further show here that the rearranged estimate necessarily
improves upon the original estimate, whenever the latter is not monotonic. Local polynomial
regression is a related local method (Chaudhuri 1991, Fan and Gijbels 1996). In particular, the
local linear estimator takes the form

$$\{\hat f(x), \hat d(x)\} = \arg\min_{b \in \mathbb{R},\, c \in \mathbb{R}^d} \sum_{i=1}^{n} w_i\, \rho\{Y_i - b - c'(X_i - x)\}^2, \qquad w_i = K\left(\frac{X_i - x}{h}\right).$$

The resulting estimate $x \mapsto \hat f(x)$, while theoretically attractive and computationally tractable,
may also be non-monotonic, as illustrated in Section 4.
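As a minimal sketch of the kernel estimator above, consider the special case of a scalar regressor, the square loss, and a box kernel, for which the minimizer is simply the average of the $Y_i$ whose $X_i$ fall within one bandwidth of $x$. The function name `box_kernel_mean`, the simulated data, and the bandwidth are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def box_kernel_mean(x_eval, X, Y, h):
    """Kernel estimate under square loss with a box kernel (scalar regressor).

    Minimizing sum_i w_i * (Y_i - b)**2 over b gives the weighted mean of Y_i,
    where w_i = 1 if |X_i - x| / h <= 1 and 0 otherwise.
    """
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    est = np.full(x_eval.shape, np.nan)      # NaN where no observation is in the window
    for k, x in enumerate(x_eval):
        w = (np.abs((X - x) / h) <= 1.0).astype(float)
        if w.sum() > 0:
            est[k] = np.sum(w * Y) / np.sum(w)
    return est

# Hypothetical data: an increasing regression function observed with noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=200)
Y = X + 0.3 * rng.standard_normal(200)

x_grid = np.linspace(0.0, 1.0, 51)
f_hat = box_kernel_mean(x_grid, X, Y, h=0.1)   # need not be monotone
f_star = np.sort(f_hat)                         # increasing rearrangement on the grid
```

The estimate `f_hat` typically wiggles because of sampling noise, which is exactly the situation in which the rearrangement is applied.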

2.2. The rearrangement and its estimation property: the univariate case. In what
follows, let $\mathcal{X}$ be a compact interval; without loss of generality we take $\mathcal{X} = [0, 1]$. Let $f$ be a
measurable function mapping $\mathcal{X}$ to $K$, a bounded subset of $\mathbb{R}$. The increasing rearrangement
$f^*$ of $f$ is the quantile function of the random variable $f(X)$ when $X \sim U(0, 1)$, that is,

$$f^*(x) = \inf\left\{ y \in \mathbb{R} : \int_{\mathcal{X}} 1\{f(u) \le y\}\,du \ge x \right\}.$$

The rearrangement operator simply transforms a function $f$ to its quantile function $f^*$. For
computing purposes when $f$ is continuous, we can think of the rearrangement as a sorting
operation: given values of the function $f$ evaluated at $x$ in a fine enough net of equidistant
points, we simply sort the values in increasing order to create the sorted, i.e., rearranged,
function.
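The sorting description can be sketched in a few lines. The target `f0`, the wiggly estimate `f_hat`, and the helper names below are hypothetical choices made only for illustration; the $L^p$ error is approximated by an average over an equidistant grid.

```python
import numpy as np

def rearrange(values):
    """Increasing rearrangement of a function sampled on an equidistant grid."""
    return np.sort(np.asarray(values, dtype=float))

def lp_error(est, target, p=2.0):
    """Grid approximation of the L^p distance on [0, 1]."""
    return float(np.mean(np.abs(est - target) ** p) ** (1.0 / p))

grid = np.linspace(0.0, 1.0, 201)
f0 = grid ** 2                               # hypothetical increasing target
f_hat = f0 + 0.15 * np.sin(12.0 * grid)      # hypothetical non-monotone estimate
f_star = rearrange(f_hat)                    # sorted, hence weakly increasing

err_original = lp_error(f_hat, f0)
err_rearranged = lp_error(f_star, f0)
```

On this grid the sorted values are weakly increasing and `err_rearranged` does not exceed `err_original`, in line with the improvement property stated next.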

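Proposition 1. Let the target $f_0 : \mathcal{X} \to K$ be a weakly increasing measurable function in $x$,
and $\hat f : \mathcal{X} \to K$ be another measurable function, an initial estimate of $f_0$.

1. For any $p \in [1, \infty]$, the rearrangement of $\hat f$, denoted $\hat f^*$, weakly reduces the estimation error:

$$\left\{ \int_{\mathcal{X}} \left| \hat f^*(x) - f_0(x) \right|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}} \left| \hat f(x) - f_0(x) \right|^p dx \right\}^{1/p}. \tag{2.1}$$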



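2. Suppose that there exist regions $\mathcal{X}_0$ and $\mathcal{X}_0'$, each of measure greater than $\delta > 0$, such that for
all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$ we have that (i) $x' > x$, (ii) $\hat f(x) > \hat f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$,
for some $\epsilon > 0$. Then the gain in the quality of estimation is strict for $p \in (1, \infty)$. Namely,
for any $p \in (1, \infty)$,

$$\left\{ \int_{\mathcal{X}} \left| \hat f^*(x) - f_0(x) \right|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}} \left| \hat f(x) - f_0(x) \right|^p dx - \delta \eta_p \right\}^{1/p}, \tag{2.2}$$

where $\eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\} > 0$, with the infimum taken over all
$v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.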

Proposition 1 establishes that the rearranged estimate $\hat f^*$ has a smaller, often strictly smaller,
estimation error in the $L^p$ norm than the original estimate whenever the latter is not monotone.
This very useful and generally applicable property is independent of the sample size and of
the way the original estimate $\hat f$ is obtained. As follows from (2.2), the reduction in estimation
error is strict for $L^p$ norms with $p \in (1, \infty)$ if the original estimate $\hat f$ is decreasing on a subset
of $\mathcal{X}$ having positive measure, while the target function $f_0$ is increasing on this subset. If $f_0$ is
constant, then there is no reduction in estimation error; that is, the inequality (2.1) becomes
an equality, since the random variables $\hat f^*(X)$ and $\hat f(X)$ share the same quantile function $\hat f^*$
and hence the same distribution function, and $f_0(X)$ is constant.

The weak inequality (2.1) is a direct, yet important, consequence of the classical rearrangement
inequality due to Lorentz (1953): let $q$ and $g$ be two functions mapping $\mathcal{X}$ to $K$, and
$q^*$ and $g^*$ be their corresponding increasing rearrangements; then
$\int_{\mathcal{X}} L\{q^*(x), g^*(x)\}\,dx \le \int_{\mathcal{X}} L\{q(x), g(x)\}\,dx$
for any submodular discrepancy function $L : \mathbb{R}^2 \to \mathbb{R}_+$. We set $q = \hat f$,
$q^* = \hat f^*$, $g = f_0$, and $g^* = f_0^*$. In our case $f_0^* = f_0$ almost everywhere, that is, the target
function is its own rearrangement. Further, recall that $L$ is submodular if for each pair of
vectors $(v, t)$ and $(v', t')$ in $\mathbb{R}^2$, we have that

$$L(v \wedge v', t \wedge t') + L(v \vee v', t \vee t') \le L(v, t) + L(v', t'). \tag{2.3}$$

In other words, a function $L$ measuring the discrepancy between pairs of vectors is submodular
if co-monotonization of the pair reduces the discrepancy. When the function $L$ is smooth,
submodularity is equivalent to $\partial^2 L(v, t)/(\partial v\, \partial t) \le 0$ holding for each $(v, t)$ in $\mathbb{R}^2$. Thus, for
example, power functions $L(v, t) = |v - t|^p$ for $p \in [1, \infty)$ and many other loss functions are
submodular. The weak inequality (2.1) then follows.
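The submodularity inequality for power losses can be checked numerically. The sketch below evaluates, on a small hypothetical grid of values, the right-hand side minus the left-hand side of the submodularity inequality for $L(v, t) = |v - t|^p$; the function names and the grid are assumptions made for illustration, and a finite check of this kind is of course not a proof.

```python
import itertools
import numpy as np

def loss(v, t, p):
    """Power discrepancy L(v, t) = |v - t|**p."""
    return abs(v - t) ** p

def submodular_gap(v, v_prime, t, t_prime, p):
    """L(v,t) + L(v',t') minus L(min v, min t) + L(max v, max t); nonnegative if submodular."""
    lhs = loss(min(v, v_prime), min(t, t_prime), p) + loss(max(v, v_prime), max(t, t_prime), p)
    rhs = loss(v, t, p) + loss(v_prime, t_prime, p)
    return rhs - lhs

points = np.linspace(-1.0, 1.0, 9)   # hypothetical values in a bounded set K
smallest_gap = {
    p: min(submodular_gap(v, vp, t, tp, p)
           for v, vp, t, tp in itertools.product(points, repeat=4))
    for p in (1.0, 2.0, 3.0)
}
```

For each exponent the smallest gap is zero up to rounding, attained when the pair is already co-monotone, so pairing small with small and large with large never increases the total discrepancy.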

2.3. The rearrangement and its estimation property: the multivariate case. In this
section we consider multivariate functions $f : \mathcal{X}^d \to K$, where $\mathcal{X}^d = [0, 1]^d$ and $K$ is a bounded
subset of $\mathbb{R}$. The notion of monotonicity we seek to impose on $f$ is the following: we say
that the function $f$ is weakly increasing in the vector $x$ if $f(x') \le f(x)$ whenever $x' \le x$
(componentwise). In what follows, we use $f(x_j, x_{-j})$ to denote the dependence of $f$ on $x_j$ and
on all other arguments, $x_{-j}$, that exclude $x_j$. The notion of monotonicity above is equivalent to
the requirement that for each $j$ in $1, \ldots, d$ the mapping $x_j \mapsto f(x_j, x_{-j})$ is weakly increasing
in $x_j$, for each $x_{-j}$ in $\mathcal{X}^{d-1}$.

Define the rearrangement operator $R_j$ and the rearranged function $f_j^*$ with respect to $x_j$ as

$$f_j^*(x) = R_j f(x) = \inf\left\{ y : \int_{\mathcal{X}} 1\{f(x_j', x_{-j}) \le y\}\,dx_j' \ge x_j \right\}.$$

This is the one-dimensional increasing rearrangement applied to the one-dimensional function
$x_j \mapsto f(x_j, x_{-j})$, holding the other arguments $x_{-j}$ fixed. The rearrangement is applied for
every value of the other arguments $x_{-j}$.

Let $\pi = (\pi_1, \ldots, \pi_d)$ be an ordering, i.e., a permutation, of the integers $1, \ldots, d$. Let us
define the $\pi$-rearrangement operator $R_\pi$ and the $\pi$-rearranged function $f_\pi^*$ as
$f_\pi^* = R_\pi f = R_{\pi_1} \cdots R_{\pi_d} f$. For any ordering $\pi$, the $\pi$-rearrangement operator rearranges the function with
respect to all of its arguments. As shown below, the resulting function $f_\pi^*$ is weakly increasing in
$x$. In general, two different orderings $\pi$ and $\pi'$ of $1, \ldots, d$ can yield different rearranged functions
$f_\pi^*$ and $f_{\pi'}^*$. To resolve the conflict among rearrangements done with different orderings, we
may consider averaging among them: letting $\Pi$ be any finite collection of orderings $\pi$, we can
define the average rearrangement as

$$f^* = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} f_\pi^*,$$

where $|\Pi|$ denotes the number of elements in the set of orderings $\Pi$. Dette and Scheder (2006)
also proposed averaging all the possible orderings of a related smoothed procedure in the
context of monotone conditional mean estimation. As shown below, the estimation error of
the average rearrangement is weakly smaller than the average of estimation errors of individual
$\pi$-rearrangements.
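On a rectangular grid, the one-dimensional rearrangement with respect to the $j$-th argument amounts to sorting the array of function values along axis $j$, so the $\pi$-rearrangement and the average rearrangement can be sketched as follows. The bivariate target, the noisy estimate, and the function names are hypothetical and serve only as an illustration.

```python
import itertools
import numpy as np

def pi_rearrange(values, ordering):
    """Apply R_{pi_1} ... R_{pi_d}: the last axis listed in `ordering` is sorted first."""
    out = np.asarray(values, dtype=float)
    for axis in reversed(ordering):
        out = np.sort(out, axis=axis)
    return out

def average_rearrange(values, orderings=None):
    """Average of pi-rearrangements; defaults to all d! orderings of the axes."""
    values = np.asarray(values, dtype=float)
    if orderings is None:
        orderings = list(itertools.permutations(range(values.ndim)))
    return np.mean([pi_rearrange(values, o) for o in orderings], axis=0)

def is_weakly_increasing(values, tol=1e-12):
    """True if the array is weakly increasing along every axis."""
    return all(np.all(np.diff(values, axis=a) >= -tol) for a in range(values.ndim))

def l2_error(est, target):
    return float(np.sqrt(np.mean((est - target) ** 2)))

rng = np.random.default_rng(0)
g1, g2 = np.meshgrid(np.linspace(0, 1, 30), np.linspace(0, 1, 30), indexing="ij")
f0 = g1 + g2                                        # hypothetical increasing target
f_hat = f0 + 0.3 * rng.standard_normal(f0.shape)    # hypothetical noisy estimate

f_12 = pi_rearrange(f_hat, (0, 1))
f_21 = pi_rearrange(f_hat, (1, 0))
f_avg = average_rearrange(f_hat)
```

The two orderings generally give different arrays, which is why the average is offered as a way to avoid an arbitrary choice of ordering.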

The following proposition describes the properties of multivariate 7r-rearrangements: 

Proposition 2. Let the target function $f_0 : \mathcal{X}^d \to K$ be weakly increasing and measurable in
$x$. Let $\hat f : \mathcal{X}^d \to K$ be a measurable function that is an initial estimate of $f_0$. Let $\bar f : \mathcal{X}^d \to K$
be another estimate of $f_0$, which is measurable in $x$, including, for example, a rearranged $\hat f$
with respect to some of the arguments.

1. For each ordering $\pi$ of $1, \ldots, d$, the $\pi$-rearranged estimate $\hat f_\pi^*$ is weakly increasing.
Moreover, $\hat f^*$, an average of $\pi$-rearranged estimates, is weakly increasing.

2. (a) For any $j$ in $1, \ldots, d$ and any $p$ in $[1, \infty]$, the rearrangement of $\bar f$ with respect to the
$j$-th argument produces a weak reduction in the estimation error:

$$\left\{ \int_{\mathcal{X}^d} |\bar f_j^*(x) - f_0(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\bar f(x) - f_0(x)|^p dx \right\}^{1/p}.$$

(b) A $\pi$-rearranged estimate $\hat f_\pi^*$ of $\hat f$ weakly reduces the estimation error of $\hat f$:

$$\left\{ \int_{\mathcal{X}^d} |\hat f_\pi^*(x) - f_0(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\hat f(x) - f_0(x)|^p dx \right\}^{1/p}. \tag{2.4}$$

3. Suppose that there exist subsets $\mathcal{X}_j \subset \mathcal{X}$ and $\mathcal{X}_j' \subset \mathcal{X}$, each of measure greater than $\delta > 0$,
and a subset $\mathcal{X}_{-j} \subseteq \mathcal{X}^{d-1}$, of measure $\nu > 0$, such that for all $x = (x_j, x_{-j})$ and $x' = (x_j', x_{-j})$,
with $x_j' \in \mathcal{X}_j'$, $x_j \in \mathcal{X}_j$, $x_{-j} \in \mathcal{X}_{-j}$, we have that (i) $x_j' > x_j$, (ii) $\bar f(x) > \bar f(x') + \epsilon$, and (iii)
$f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$.

(a) Then, for any $p \in (1, \infty)$,

$$\left\{ \int_{\mathcal{X}^d} |\bar f_j^*(x) - f_0(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\bar f(x) - f_0(x)|^p dx - \eta_p \delta \nu \right\}^{1/p},$$

where $\eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\} > 0$, with the infimum taken over all
$v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.

(b) Further, for an ordering $\pi = (\pi_1, \ldots, \pi_k, \ldots, \pi_d)$ with $\pi_k = j$, let $\bar f$ be a partially
rearranged function, $\bar f = R_{\pi_{k+1}} \cdots R_{\pi_d} \hat f$ (for $k = d$ we set $\bar f = \hat f$). If the function $\bar f$ and the
target function $f_0$ satisfy the condition stated above, then, for any $p \in (1, \infty)$,

$$\left\{ \int_{\mathcal{X}^d} |\hat f_\pi^*(x) - f_0(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\hat f(x) - f_0(x)|^p dx - \eta_p \delta \nu \right\}^{1/p}. \tag{2.5}$$

4. The estimation error of an average rearrangement is weakly smaller than the average
estimation error of the individual $\pi$-rearrangements: for any $p \in [1, \infty]$,

$$\left\{ \int_{\mathcal{X}^d} |\hat f^*(x) - f_0(x)|^p dx \right\}^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left\{ \int_{\mathcal{X}^d} |\hat f_\pi^*(x) - f_0(x)|^p dx \right\}^{1/p}.$$

Proposition 2 generalizes Proposition 1 to the multivariate case, also demonstrating several
features unique to the multivariate case. We see that the $\pi$-rearranged functions are monotonic
in all of the arguments. Dette and Scheder (2006), using a different argument, showed that their
related smoothed procedure for conditional mean functions is monotonic in both arguments
for the bivariate case in large samples. The rearrangement along any argument improves
the estimation properties. Moreover, the improvement is strict when the rearrangement with
respect to a $j$-th argument is performed on an estimate that is decreasing in the $j$-th argument,



Figure 1. Geometric illustration for the proof of Proposition 1 (left panel)
and comparison to isotonic regression (right panel). The solid dark line is the
target function $f_0$, the dotted line is the original estimate $\hat f$, the dashed line is
the rearranged estimate $\hat f^*$, the dotted-dashed line is the isotonized estimate $\hat f^I$,
and the solid light line is the average of the rearranged and isotonized estimates
$\hat f^{1/2}$. In the left panel $L(v, t) = a^p$, $L(v', t) = b^p$, $L(v', t') = c^p$, and $L(v, t') = d^p$.

while the target function is increasing in the same $j$-th argument, in the sense precisely defined
in the proposition. Averaging different $\pi$-rearrangements is better on average than using a
single $\pi$-rearrangement chosen at random.

2.4. Discussion. Here we informally explain why rearrangement provides the improvement 
property and compare rearrangement to isotonization. 

We begin by noting that the proof of the improvement property can be first reduced to
the case of step functions or, equivalently, functions with a finite domain, and then to the
case of functions with a two-point domain. The improvement property for such functions then
follows from the submodularity property (2.3). In the left panel of Figure 1 we illustrate
this geometrically by plotting the original estimate $\hat f$, the rearranged estimate $\hat f^*$, and the
true function $f_0$. In this example, the original estimate is decreasing and hence violates the
monotonicity requirement. We see that the two-point rearrangement co-monotonizes $\hat f^*$ with
$f_0$ and thus brings $\hat f^*$ closer to $f_0$. Also, we can view the rearrangement as a projection on the
set of weakly increasing functions that have the same distribution as the original estimate $\hat f$.

In the right panel of Fig. 1 we plot both the rearranged and isotonized estimates. The
isotonized estimate $\hat f^I$ is a projection of the original estimate $\hat f$ on the set of weakly increasing
functions, that only preserves the mean of the original estimate. We can compute the two
values of the isotonized estimate $\hat f^I$ by assigning to both the average of the two values of the
original estimate $\hat f$, whenever the latter violate the monotonicity requirement, and leaving the
original values unchanged otherwise. In our example in Fig. 1 this produces a flat function
$\hat f^I$. This pool adjacent violators procedure extends to domains with more than two points
by applying the procedure iteratively to any pair of points at which monotonicity is violated
(Ayer, Brunk, Ewing, Reid, and Silverman 1955).
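The contrast between the two procedures on a two-point domain can be sketched as follows. The function `isotonize` below is a plain block-merging version of the pool adjacent violators idea for equally weighted grid points; it is written for illustration under that assumption and is not the implementation used for the results in the paper.

```python
import numpy as np

def isotonize(values):
    """Pool adjacent violators for equally weighted points (least-squares isotonic fit)."""
    blocks = []  # each block stores [sum of values, number of points]
    for v in np.asarray(values, dtype=float):
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

f_hat = np.array([2.0, 1.0])     # hypothetical decreasing two-point estimate
f_iso = isotonize(f_hat)         # both values replaced by their average
f_star = np.sort(f_hat)          # rearrangement swaps the two values
f_half = 0.5 * (f_star + f_iso)  # average of the two monotone estimates
```

Here isotonization returns the flat function with both values equal to 1.5, while the rearrangement returns the values 1 and 2, so the two procedures preserve different features of the original estimate: the mean only versus the whole distribution of values.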

Using the computational definition of isotonization, one can show that, like rearrangement,
isotonization also improves upon the original estimate, for any $p \in [1, \infty]$:

$$\left\{ \int_{\mathcal{X}} |\hat f^I(x) - f_0(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}} |\hat f(x) - f_0(x)|^p dx \right\}^{1/p},$$

see, e.g., Barlow, Bartholomew, Bremner, and Brunk (1972). Therefore, it follows that any
function $\hat f^\lambda$ in the convex hull of the rearranged and isotonized estimate both (1) monotonizes
and (2) improves upon the original estimate $\hat f$, that is, for any $p \in [1, \infty]$ and $\lambda \in [0, 1]$,

$$\left\{ \int_{\mathcal{X}} |\hat f^\lambda(x) - f_0(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}} |\hat f(x) - f_0(x)|^p dx \right\}^{1/p},$$

where $\hat f^\lambda = \lambda \hat f^* + (1 - \lambda) \hat f^I$. The first property is obvious and the second follows from
homogeneity and subadditivity of norms. By induction on the dimension, the improvement
property extends to the sequential multivariate isotonization and to its convex hull with the
sequential multivariate rearrangement.
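The norm argument for the convex combination can be written out explicitly. In the sketch below, $\|g\|_p$ is shorthand introduced here for the $L^p$ norm of $g$ on the domain, $\hat f^*$ and $\hat f^I$ are the rearranged and isotonized estimates, and $\hat f^\lambda = \lambda \hat f^* + (1 - \lambda)\hat f^I$ with $\lambda \in [0, 1]$.

```latex
\begin{align*}
\|\hat f^{\lambda} - f_0\|_p
  &= \|\lambda(\hat f^{*} - f_0) + (1-\lambda)(\hat f^{I} - f_0)\|_p \\
  &\le \lambda \|\hat f^{*} - f_0\|_p + (1-\lambda)\|\hat f^{I} - f_0\|_p \\
  &\le \lambda \|\hat f - f_0\|_p + (1-\lambda)\|\hat f - f_0\|_p
   = \|\hat f - f_0\|_p .
\end{align*}
```

The first inequality uses subadditivity and homogeneity of the norm, and the second uses the separate improvement properties of the rearrangement and of isotonization.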

Thus, we see that a rather rich class of procedures both monotonizes the original estimate
and reduces the distance to the true target function. However, there is no single best
distance-reducing monotonizing procedure. Indeed, whether the rearranged estimate $\hat f^*$ approximates
the target function better than the isotonized estimate $\hat f^I$ depends on how steep or flat the
target function is. We illustrate this point using the example plotted in the right panel of Fig.
1: consider any increasing target function taking values in the shaded area between $\hat f^*$ and
$\hat f^I$, and also the function $\hat f^{1/2}$, the average of the isotonized and the rearranged estimate, that
passes through the middle of the shaded area. Suppose first that the target function is steeper
than $\hat f^{1/2}$; then $\hat f^*$ has a smaller estimation error than $\hat f^I$. Now suppose instead that the target
function is flatter than $\hat f^{1/2}$; then $\hat f^I$ has a smaller estimation error than $\hat f^*$. It is also clear
that, if the target function is neither very steep nor very flat, $\hat f^{1/2}$ can outperform either $\hat f^*$
or $\hat f^I$. Thus, in practice we can choose rearrangement, isotonization, or some combination of
the two, depending on our beliefs about how steep or flat the target function is in a particular
application.






3. Improving Interval Estimates of Monotone Functions by Rearrangement 

In this section we propose to directly apply the rearrangement, univariate and multivariate, 
to simultaneous confidence intervals for monotone functions. We show that our proposal will 
necessarily improve the original intervals by decreasing their length while retaining the same 
or greater coverage level. 

Suppose that we are given an initial simultaneous confidence interval

$$[\ell, u] = \{[\ell(x), u(x)],\ x \in \mathcal{X}^d\}, \tag{3.1}$$

where $\ell$ and $u$ are the lower and upper end-point functions such that $\ell \le u$ on $\mathcal{X}^d$, that is,
$\ell(x) \le u(x)$ for all $x \in \mathcal{X}^d$. We further suppose that the confidence interval $[\ell, u]$ has either the
exact or the asymptotic confidence property for the estimand function $f$, namely, for a given
$\alpha \in (0, 1)$,

$$\mathrm{pr}_P\{f \in [\ell, u]\} \ge 1 - \alpha, \tag{3.2}$$

for all probability measures $P$ in some set $\mathcal{P}_n$ containing the true probability measure $P_0$.
The statement $f \in [\ell, u]$ means that $\ell(x) \le f(x) \le u(x)$ for all $x \in \mathcal{X}^d$. We assume that
property (3.2) holds either in the finite sample sense, that is, for the given sample size $n$,
or in the asymptotic sense, that is, for all but finitely many sample sizes $n$ (Lehmann and
Romano 2005).

A common confidence interval for functions specifies

$$\ell(x) = \hat f(x) - c\,s(x), \qquad u(x) = \hat f(x) + c\,s(x), \tag{3.3}$$

where $\hat f(x)$ is a point estimate, $s(x)$ is the standard error of the point estimate, and $c$ is a critical
value chosen to attain the confidence property (3.2). Wasserman (2006) provides an excellent
overview of methods for constructing the critical value. The problem with such confidence
intervals, as with the point estimates themselves, is that they need not be monotonic. Indeed,
the end-point functions (3.3) need not be monotonic, so the confidence interval may contain
non-monotone functions that can be excluded from it. Accordingly we can intersect the interval with the
set of monotone functions to reduce its length without affecting its coverage level. In some
cases, however, the initial interval may not contain any monotone function and the resulting
intersected interval is empty, due, for example, to misspecification.

We say that confidence intervals are misspecified or incorrectly centered if the estimand $f$,
being covered by $[\ell, u]$ in (3.2), is not equal to the weakly increasing target function $f_0$, so
that $f$ may not be monotone. Incorrect centering is rather common both in parametric and
non-parametric estimation. In parametric estimation correct centering of confidence intervals
requires perfect specification of functional forms, whereas in nonparametric estimation correct
centering requires the so-called undersmoothing; both are difficult. In real applications with
many regressors, researchers tend to use oversmoothing rather than undersmoothing. In a
recent development, Genovese and Wasserman (2008) provide some formal justification for
oversmoothing: targeting inference on functions $f$ that represent various smoothed versions
of $f_0$, and thus summarize features of $f_0$, may be desirable to make inference more robust,
or, equivalently, to enlarge the class of data-generating processes $\mathcal{P}_n$ for which (3.2) holds.
Regardless of the reasons why the confidence intervals may target $f$ instead of $f_0$, our
procedures will work for inference on the monotonized, hence improved, version $f^*$ of $f$.

Our proposal for improved interval estimates is to rearrange the entire simultaneous
confidence interval into a monotonic interval

$$[\ell^*, u^*] = \{[\ell^*(x), u^*(x)],\ x \in \mathcal{X}^d\}, \tag{3.4}$$

where the lower and upper end-point functions $\ell^*$ and $u^*$ are the increasing rearrangements
of the original end-point functions $\ell$ and $u$. In the multivariate case, we use the symbols
$\ell^*$ and $u^*$ to denote either multivariate $\pi$-rearrangements $\ell_\pi^*$ and $u_\pi^*$ or average multivariate
rearrangements $\ell^*$ and $u^*$, whenever we do not need to emphasize specifically the dependence
on $\pi$.
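In the univariate case the proposal can be sketched on a grid by sorting the two end-point functions separately. The point estimate, standard error, and critical value below are hypothetical numbers chosen only to produce a non-monotone band; they are not the estimates or bootstrap quantities used in the empirical section.

```python
import numpy as np

def lp_length(lower, upper, p=2.0):
    """Grid approximation of the L^p length of the band [lower, upper]."""
    return float(np.mean(np.abs(upper - lower) ** p) ** (1.0 / p))

grid = np.linspace(0.0, 1.0, 101)
f_hat = grid + 0.2 * np.sin(10.0 * grid)   # hypothetical non-monotone point estimate
s = 0.05 + 0.05 * grid                     # hypothetical standard error function
c = 2.0                                    # hypothetical critical value

lower, upper = f_hat - c * s, f_hat + c * s               # band of the form f_hat -/+ c*s
lower_star, upper_star = np.sort(lower), np.sort(upper)   # rearranged end-points
f_star = np.sort(f_hat)                                   # rearranged function inside the band

length_original = lp_length(lower, upper)
length_rearranged = lp_length(lower_star, upper_star)
```

Because sorting preserves pointwise ordering, the rearranged lower end-point stays below the rearranged upper end-point, and any function lying inside the original band has its rearrangement inside the rearranged band.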

The following proposition describes the properties of the rearranged confidence intervals. 

Proposition 3. Let $[\ell, u]$ in (3.1) be the original confidence interval satisfying the property
(3.2) for the estimand function $f : \mathcal{X}^d \to K$ and let the rearranged confidence interval $[\ell^*, u^*]$
be defined as in (3.4).

1. The interval $[\ell^*, u^*]$ is weakly increasing and non-empty, in the sense that the end-point
functions $\ell^*$ and $u^*$ are weakly increasing on $\mathcal{X}^d$ and satisfy $\ell^* \le u^*$ on $\mathcal{X}^d$. Moreover,
the event that $f \in [\ell, u]$ implies the event that $f^* \in [\ell^*, u^*]$. In particular, under the correct
specification, when $f$ equals a weakly increasing target function $f_0$, we have that $f = f^* = f_0$,
so that $f_0 \in [\ell, u]$ implies $f_0 \in [\ell^*, u^*]$. Therefore, $[\ell^*, u^*]$ covers $f^*$, which is equal to $f_0$ under
the correct specification, with a probability that is greater than or equal to the probability that $[\ell, u]$
covers $f$.

2. The interval $[\ell^*, u^*]$ is weakly shorter than $[\ell, u]$ in the $L^p$ length: for each $p \in [1, \infty]$,

$$\left\{ \int_{\mathcal{X}^d} |\ell^*(x) - u^*(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\ell(x) - u(x)|^p dx \right\}^{1/p}. \tag{3.5}$$

3. In the univariate case, suppose that there exist subsets $\mathcal{X}_0 \subset \mathcal{X}$ and $\mathcal{X}_0' \subset \mathcal{X}$, each of
measure greater than $\delta > 0$, such that for all $x' \in \mathcal{X}_0'$ and $x \in \mathcal{X}_0$, we have that $x' > x$, and
either (i) $\ell(x) > \ell(x') + \epsilon$ and $u(x') > u(x) + \epsilon$, for some $\epsilon > 0$, or (ii) $\ell(x') > \ell(x) + \epsilon$ and
$u(x) > u(x') + \epsilon$, for some $\epsilon > 0$. Then, for any $p \in (1, \infty)$,

$$\left\{ \int_{\mathcal{X}} |\ell^*(x) - u^*(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}} |\ell(x) - u(x)|^p dx - \eta_p \delta \right\}^{1/p},$$

where $\eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\} > 0$, where the infimum is taken over
all $v, v', t, t'$ in $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.

In the multivariate case with $d \ge 2$, for an ordering $\pi = (\pi_1, \ldots, \pi_k, \ldots, \pi_d)$ of integers
$\{1, \ldots, d\}$ with $\pi_k = j$, let $\bar g$ denote the partially rearranged function, $\bar g = R_{\pi_{k+1}} \cdots R_{\pi_d} g$,
where for $k = d$ we set $\bar g = g$. Suppose there exist subsets $\mathcal{X}_j \subset \mathcal{X}$ and $\mathcal{X}_j' \subset \mathcal{X}$, each of
measure greater than $\delta > 0$, and a subset $\mathcal{X}_{-j} \subseteq \mathcal{X}^{d-1}$, of measure $\nu > 0$, such that for all
$x = (x_j, x_{-j})$ and $x' = (x_j', x_{-j})$, with $x_j' \in \mathcal{X}_j'$, $x_j \in \mathcal{X}_j$, $x_{-j} \in \mathcal{X}_{-j}$, we have that (i) $x_j' > x_j$,
and either (ii) $\bar\ell(x) > \bar\ell(x') + \epsilon$ and $\bar u(x') > \bar u(x) + \epsilon$, for some $\epsilon > 0$, or (iii) $\bar\ell(x') > \bar\ell(x) + \epsilon$
and $\bar u(x) > \bar u(x') + \epsilon$, for some $\epsilon > 0$. Then, for any $p \in (1, \infty)$ and $\eta_p > 0$ defined as above,

$$\left\{ \int_{\mathcal{X}^d} |\ell_\pi^*(x) - u_\pi^*(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\ell(x) - u(x)|^p dx - \eta_p \delta \nu \right\}^{1/p}.$$

Proposition 3 shows that the rearranged confidence intervals are weakly shorter than the
original confidence intervals, and also qualifies when the rearranged confidence intervals are
strictly shorter. In particular, in the univariate case the inequality (3.5) is necessarily strict for
$p \in (1, \infty)$ if there is a region of positive measure in $\mathcal{X}$ over which the end-point functions $\ell$ and
$u$ are not comonotonic. This weak shortening result follows for univariate cases directly from
the Lorentz (1953) inequality, and the strong shortening by its strengthening. The shortening
results for the multivariate case follow by induction on the dimension. Moreover, the
order-preservation property of the univariate and multivariate rearrangements, demonstrated in the
proof, implies that the rearranged confidence interval $[\ell^*, u^*]$ has a weakly higher coverage than
the original confidence interval $[\ell, u]$. We do not quantify strict improvements in coverage, but
demonstrate them through the examples in the next section.

Our idea of directly monotonizing the interval estimates also applies to other monotonization
procedures. Indeed, the proof of Proposition 3 reveals that part 1 applies to any
order-preserving monotonization operator $T$, such that

$$g \le m \ \text{ implies } \ Tg \le Tm. \tag{3.6}$$

Furthermore, part 2 of Proposition 3 on the weak shortening of the confidence intervals applies
to any distance-reducing operator $T$ such that

$$\left\{ \int_{\mathcal{X}^d} |T\ell(x) - Tu(x)|^p dx \right\}^{1/p} \le \left\{ \int_{\mathcal{X}^d} |\ell(x) - u(x)|^p dx \right\}^{1/p}. \tag{3.7}$$

Rearrangements are instances of operators that have properties (3.6) and (3.7). Isotonization
is another important instance (Robertson, Wright, and Dykstra 1988). Moreover, convex
combinations of order-preserving and distance-reducing operators, such as the average of
rearrangement and isotonization, also have properties (3.6) and (3.7).

4. Illustrations

4.1. An empirical illustration with age-height reference charts. In this section we provide
an empirical application to biometric age-height charts. We show how the rearrangement
monotonizes and improves various nonparametric point and interval estimates for functions.

Since their introduction by Quetelet in the 19th century, reference growth charts have be- 
come common tools to assess an individual's health status. These charts describe the evolution 
of individual anthropometric measures, such as height, weight, and body mass index, across 
different ages. See Cole (1988) for a classical work on the subject, and Wei, Pere, Koenker, 
and He (2006) for a recent analysis from a quantile regression perspective and additional refer- 
ences. Here we consider an application of the rearrangement and other related methods to the 
estimation of growth charts for height. This makes sense since an individual's height should 
follow an increasing relationship with age up to adulthood. Our data consist of repeated
cross-sectional measurements of height in centimeters and age in months from the 2003-2004 US
National Health and Nutrition Survey, and are further restricted to the subsample of US-born white
males aged 2-20 to avoid other confounding factors, giving us a sample of 533 observations.

Let $Y$ and $X$ denote height and age, respectively. Let $E[Y \mid X = x]$ denote the conditional
expectation of $Y$ given $X = x$, and $Q_Y[u \mid X = x]$ denote the conditional $u$-th quantile of $Y$
given $X = x$, where $u$ is the quantile index. The target functions of interest are the conditional
expectation function, $x \mapsto E[Y \mid X = x]$, the conditional quantile functions for several quantile
indices, $x \mapsto Q_Y[u \mid X = x]$, for $u$ = 5%, 50%, and 95%, and the entire conditional quantile
process for height given age, $(u, x) \mapsto Q_Y[u \mid X = x]$. The monotonicity requirements for these
target functions are the following: the first two should be increasing in age $x$, and the third
should be increasing in both age $x$ and the quantile index $u$.

We estimate the target functions using non-parametric ordinary least squares or quantile 
regression and then rearrange the estimates to satisfy the monotonicity requirements. We 
consider kernel, local linear, regression splines, and Fourier series methods. For the kernel and 
local linear methods, we choose a bandwidth of one year and a box kernel. For the regression 
splines method, we use cubic B-splines with a knot sequence {3, 5, 8, 10, 11.5, 13, 14.5, 16, 18} 
(Wei, Pere, Koenker, and He 2006). For the Fourier method, we employ four sines and four 
cosines. For the estimation of the conditional quantile process, we use {0.005, 0.010, . . . , 0.995} 
as a net of quantile indices. 

Figure 2 shows the original and rearranged estimates of the conditional quantile functions
for the different methods. All the estimated curves have trouble capturing the slowdown in
the growth of height after age fifteen and yield non-monotonic curves for the highest values
of age. The Fourier series performs particularly poorly in approximating the aperiodic
age-height relationship and has many non-monotonicities. The rearrangement delivers curves that
improve upon the original estimates and that satisfy the natural monotonicity requirement.
We quantify this improvement in the next subsection.

[Figure 2: four panels titled Kernel (h = 1), Local linear (h = 1), Regression B-Splines, and Fourier series; horizontal axis: Age (years).]

Figure 2. Estimates of the 5%, 50%, and 95% conditional quantile functions
of height given age and their increasing rearrangements, obtained by kernel,
local linear, cubic B-splines series, and Fourier series regression. Light thick
lines are the original estimates and dark thin lines are the rearranged estimates.

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Figured] (a, b) illustrates the multivariate rearrangement of the conditional quantile process 
along both the age and the quantile index arguments. We plot, in three dimensions, the origi- 
nal estimate and its average multivariate rearrangement (the average of the age-quantile and 
quantile-age rearrangements). We focus on the Fourier series estimates, which have the most 
severe non-monotonicity problems. Analogous figures for the other estimation methods are 
given in an MIT working paper containing an extended version of this article. We see that the 
estimated quantile process is non-monotone in age and in the quantile index at extremal val- 
ues of this index. The average multivariate rearrangement fixes the non-monotonicity problem 
delivering an estimate of the quantile process that is monotone in both the age and the quan- 
tile index. Furthermore, by the theoretical results of the paper, the multivariate rearranged 
estimates necessarily improve upon the original estimates. 
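
The average multivariate rearrangement can be sketched on a grid as follows. This is a hedged illustration only: it assumes the estimated process is stored as a matrix whose rows index the quantile index and whose columns index age, both on equally spaced grids, so that rearranging in one argument amounts to sorting along that axis. The helper names and the numbers are hypothetical.

```python
def sort_rows(m):
    """Rearrange in the column argument (age), holding the row (quantile) fixed."""
    return [sorted(row) for row in m]


def sort_cols(m):
    """Rearrange in the row argument (quantile), holding the column (age) fixed."""
    cols = [sorted(col) for col in zip(*m)]
    return [list(row) for row in zip(*cols)]


def average_rearrangement(m):
    """Average of the two sequential rearrangements (both orderings of the arguments)."""
    a = sort_cols(sort_rows(m))
    b = sort_rows(sort_cols(m))
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def increasing_in_both(m):
    """Check weak monotonicity along rows and along columns."""
    rows_ok = all(r[j] <= r[j + 1] for r in m for j in range(len(r) - 1))
    cols_ok = all(m[i][j] <= m[i + 1][j]
                  for i in range(len(m) - 1) for j in range(len(m[0])))
    return rows_ok and cols_ok


# Hypothetical 3 x 3 estimate, non-monotone in both arguments.
estimate = [[100.0, 98.0, 105.0],
            [103.0, 108.0, 104.0],
            [101.0, 112.0, 110.0]]
averaged = average_rearrangement(estimate)
print(averaged, increasing_in_both(averaged))
```

The two orderings generally give different monotone surfaces, which is why the sketch averages them rather than picking one.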

In Figures 3 and 4 (c, d), we plot original and rearranged 90% simultaneous confidence
intervals. Fig. 3 shows the intervals for the conditional expectation function and for the
conditional 5%, 50%, and 95% quantile functions, based on Fourier series estimates. We obtain
the original intervals of the form (3.3) using the bootstrap with 200 repetitions to estimate
the standard errors and critical values (Hall 1993). We then obtain the rearranged confidence
intervals by rearranging the lower and upper end-point functions of the initial confidence
intervals, following Section 3. In Fig. 4 (c, d), we plot the original and the rearranged 90%
simultaneous confidence intervals for the entire conditional quantile process, based on the
Fourier series estimates. The rearranged confidence intervals correct the non-monotonicity of
the original confidence intervals and reduce their integrated $L^p$ length.
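
A minimal sketch of the band rearrangement, assuming the lower and upper end-point functions are evaluated on a common equally spaced grid so that each is rearranged by sorting. The names `rearrange_band`, `lp_length`, and `covers`, and all numbers, are hypothetical; the bootstrap step that produces the original band is not shown.

```python
def rearrange_band(lower, upper):
    """Rearrange each end-point function separately by sorting its grid values."""
    return sorted(lower), sorted(upper)


def lp_length(lower, upper, p):
    """Grid approximation of the integrated L^p length of the band."""
    n = len(lower)
    return (sum(abs(u - l) ** p for l, u in zip(lower, upper)) / n) ** (1.0 / p)


def covers(lower, upper, f):
    """True if the band contains f at every grid point."""
    return all(l <= v <= u for l, v, u in zip(lower, f, upper))


# Hypothetical band and weakly increasing target on six grid points.
target = [100.0, 108.0, 109.0, 125.0, 135.0, 137.0]
lower = [95.0, 104.0, 101.0, 118.0, 130.0, 126.0]
upper = [103.0, 116.0, 110.0, 131.0, 139.0, 146.0]
new_lower, new_upper = rearrange_band(lower, upper)

print(covers(lower, upper, target), covers(new_lower, new_upper, target))
print(lp_length(lower, upper, 2), lp_length(new_lower, new_upper, 2))
```

In this example the rearranged band still contains the monotone target and has a smaller $L^2$ length, while its $L^1$ length is unchanged because sorting preserves the integral of each end-point function.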

4.2. Monte-Carlo illustration. In the following Monte Carlo experiment we quantify the 
improvement in the point and interval estimation that rearrangement can provide relative to 
the original estimates. We also compare it to isotonization and to its convex combinations 
with isotonization. Our experiment uses a model, described in detail in the Appendix, that 
mimics the empirical application very closely. This model implies a true conditional expectation 
function and quantile process that are monotone in age and in the quantile index. 

In Table 1 we report the average $L^p$ errors, for p = 1, 2, and $\infty$, for the original estimates
of the conditional expectation function. We also report the relative efficiency of the rearranged
estimates, measured as the ratio of the average error of the rearranged estimate to the average
error of the original estimate; together with relative efficiencies for alternative approaches based
on isotonization of the original estimates (Mammen 1991) and on averaging the rearranged and
isotonized estimates. For regression splines, we also consider the one-step monotone regression
splines (Ramsay 1998).
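
For concreteness, the isotonized and averaged estimates can be sketched as below. This is an assumption-laden illustration: isotonization is computed here with the pool-adjacent-violators algorithm applied with equal weights to grid values of the original estimate, which is one standard way to compute an isotonic least-squares fit and is not necessarily the exact procedure used for the tables. All names and numbers are hypothetical.

```python
def isotonize(values):
    """Pool-adjacent-violators: equal-weight nondecreasing least-squares fit."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def rearrange(values):
    """Increasing rearrangement on an equally spaced grid."""
    return sorted(values)


def average_monotonized(values):
    """Pointwise average of the rearranged and isotonized estimates."""
    return [(r + i) / 2.0 for r, i in zip(rearrange(values), isotonize(values))]


original = [98.0, 113.0, 109.0, 133.0, 152.0, 147.0]  # hypothetical grid values
print(isotonize(original))
print(average_monotonized(original))
```

Isotonization replaces each violating stretch by its average, whereas rearrangement keeps the set of values and only reorders them; the average of two weakly increasing sequences is again weakly increasing.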

For all of the methods and norms considered, the rearranged curves estimate the target
function more accurately than the original curves. There is no uniform winner between
rearrangement, isotonization, and the average of the two, which is consistent with the analysis
of Section 2.4. For example, the rearrangement outperforms the other methods for kernel,
local linear and splines, but performs worse than the average for Fourier in some norms. In
numerical results not reported, we find that rearrangement performs worse than isotonization
for global polynomials. This and other methods are available in the MIT working paper. For
regression splines, the performance of the rearrangement is comparable to the computationally
more intensive one-step monotone splines procedure.

Figure 3. 90% confidence intervals for conditional expectation function
(CEF), and 5%, 50% and 95% conditional quantile functions (CQF) of height
given age and their increasing rearrangements. Estimates are based on Fourier
series and confidence bands are obtained by bootstrap with 200 repetitions.
Dark bands are the original confidence intervals and light bands are the
rearranged confidence intervals.
[Panels: CEF, CQF: 50%, CQF: 5%, CQF: 95%; horizontal axis: Age (years).]

Figure 4. Fourier series point and interval estimates of the conditional quantile
process of height given age and their increasing rearrangements. Panels (a) and
(b) plot original estimate and its average multivariate rearrangement. Panels
(c) and (d) plot original and rearranged 90% confidence intervals. Original
confidence interval obtained by bootstrap with 200 repetitions.

Table 1. $L^p$ Estimation Errors of Original, Rearranged, Isotonized, Average
Rearranged-Isotonized, and Monotone Estimates of the Conditional Expectation
Function, for p = 1, 2, $\infty$.
$L^p_O$, $L^p_R$, $L^p_I$, $L^p_{(R+I)/2}$, and $L^p_M$ are average $L^p$ errors of the original,
rearranged, isotonized, the average rearranged-isotonized, and monotone regression splines
estimates; they are computed as the Monte Carlo average of
$\{\int_{\mathcal{X}} |f(x) - f_0(x)|^p \, dx\}^{1/p}$, where $f_0$ is the target and $f$ an estimate.

In Table 2 we report the average $L^p$ errors for the original estimates of the conditional
quantile process. We also report the ratio of the average error of the multivariate rearranged
estimate, with respect to the age and quantile index arguments, to the average error of the
original estimate; together with the same ratios for isotonized and average rearranged-isotonized
estimates. We obtain the multivariate isotonized estimates by sequentially applying the
univariate isotonization to each argument, and then averaging for the two possible orderings
age-quantile and quantile-age. For all the methods and norms considered, the multivariate
rearranged curves estimate the target function more accurately than the original curves. There
is again no uniform winner between rearrangement, isotonization, and their average.

Table 2. $L^p$ Estimation Errors of Original, Rearranged, Isotonized, and Average
Rearranged-Isotonized Estimates of the Conditional Quantile Process, for
p = 1, 2, and $\infty$.
$L^p_O$, $L^p_R$, $L^p_I$, and $L^p_{(R+I)/2}$ are the average $L^p$ errors of the original, multivariate
rearranged, multivariate isotonized, the multivariate average rearranged-isotonized estimates;
they are computed as the Monte Carlo averages of
$\{\int_{\mathcal{U}} \int_{\mathcal{X}} |f(u,x) - f_0(u,x)|^p \, dx \, du\}^{1/p}$, where $f_0$ is the target and $f$ an estimate.

Table 3 reports Monte Carlo coverage frequencies and integrated lengths for the original and
monotonized 90% confidence bands for the conditional expectation function. For a measure
of length, we used the integrated $L^p$ length, as defined in Proposition 3, with p = 1, 2, and
$\infty$. We construct the original confidence intervals of the form specified in equation (3.3) by
obtaining the pointwise standard errors of the original estimates using the bootstrap with 200
repetitions, and calibrate the critical value so that the original confidence bands cover the
entire true function with the exact frequency of 90%. We construct monotonized confidence
intervals by applying rearrangement, isotonization, and a rearrangement-isotonization average
to the end-point functions of the original confidence intervals, as proposed in Section 3. In
all cases the rearrangement and other monotonization methods increase the coverage of the
confidence intervals while reducing their length. In particular, we see that monotonization
increases coverage especially for the local estimation methods, whereas it reduces length most
noticeably for the global estimation methods. For the most problematic Fourier estimates,
there are large increases in coverage and reductions in length.

Table 3. Coverage (%) and Integrated Lengths of Original, Rearranged, Isotonized,
and Average Rearranged-Isotonized 90% Confidence Intervals for the
Conditional Expectation Function.
O, R, I, and (R + I)/2 refer to original, rearranged, isotonized, and average
rearranged-isotonized confidence intervals. Coverage probabilities (Cover) are for the
entire function.

Appendix A. Proofs of Propositions

Proof of Proposition 1. Proof of Part 1. This follows in part the strategy in Lorentz's (1953)
proof. We assume first that the functions $f$ and $f_0$ are step functions, constant on intervals
$((s - 1)/r, s/r]$, $s = 1, \ldots, r$. For each step function $f$ with r steps we associate an r-vector $f$
whose s-th element, denoted $f_s$, equals the value of the function $f$ on the s-th interval, and vice
versa. Let us define the sorting operator S acting on vectors (and functions) $f$ as follows. Let
k be an integer in $1, \ldots, r$ such that $f_k > f_m$ for some m > k. If k does not exist, set $Sf = f$.
If k exists, set $Sf$ to be an r-vector with the k-th element equal to $f_m$, the m-th element equal
to $f_k$, and all other elements equal to the corresponding elements of $f$. Finally, given a vector
$Sf$ there is a step function $Sf$ associated to it, as stated above.
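
The sorting operator S just defined can be illustrated numerically. The sketch below is only an illustration of the argument: the proof allows any out-of-order pair (k, m), whereas this code, as an assumption, always picks the first such pair; the loss $|v - t|^p$ with p = 2 is used as an example of a submodular function, and the vectors are hypothetical.

```python
def sort_step(f):
    """One application of S: swap the first pair (k, m), m > k, with f[k] > f[m]."""
    f = list(f)
    for k in range(len(f)):
        for m in range(k + 1, len(f)):
            if f[k] > f[m]:
                f[k], f[m] = f[m], f[k]
                return f
    return f  # no such k exists, so Sf = f


def loss(f, f0, p=2):
    """Integral of |f - f0|^p for r-step functions on [0, 1]."""
    return sum(abs(a - b) ** p for a, b in zip(f, f0)) / len(f)


f0 = [1.0, 2.0, 3.0, 4.0, 5.0]      # weakly increasing target (hypothetical)
start = [3.0, 1.0, 2.0, 5.0, 4.0]   # original step-function values (hypothetical)

f = list(start)
losses = [loss(f, f0)]
while True:
    nxt = sort_step(f)
    if nxt == f:                    # fixed point: f is completely sorted
        break
    f = nxt
    losses.append(loss(f, f0))

print(f, losses)
```

Each swap removes at least one inversion, so the loop stops after finitely many steps at the sorted vector, and the recorded losses never increase along the way.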

For any submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$, by $f_k > f_m$, $f_{0m} \ge f_{0k}$ and the definition
of submodularity, $L(f_m, f_{0k}) + L(f_k, f_{0m}) \le L(f_k, f_{0k}) + L(f_m, f_{0m})$. A simple geometric
illustration for this property is given in the accompanying figure. Therefore, we conclude that
$\int_{\mathcal{X}} L\{Sf(x), f_0(x)\} \, dx \le \int_{\mathcal{X}} L\{f(x), f_0(x)\} \, dx$, using that we integrate step functions.
Applying the sorting operator a sufficient finite number of times to $f$, we obtain a completely
sorted, that is, rearranged, vector $f^*$. Thus, we can express $f^*$ as $f^* = S \cdots S f$, where the
operator S is applied finitely many times. By repeating the argument above, each application
weakly reduces the estimation error. Therefore,

$$\int_{\mathcal{X}} L\{f^*(x), f_0(x)\} \, dx \le \int_{\mathcal{X}} L\{S \cdots S f(x), f_0(x)\} \, dx \le \int_{\mathcal{X}} L\{f(x), f_0(x)\} \, dx. \qquad \text{(A.1)}$$

Next we extend this result to general measurable functions $f$ and $f_0$ mapping [0, 1] to
K, where $f_0$ is a quantile function. Take a subsequence of bounded step functions $f^{(q)}$ and
$f_0^{(q)}$, with $f_0^{(q)}$ being quantile functions, converging to $f$ and $f_0$ almost everywhere as index
$q \to \infty$ along an increasing sequence of integers. The almost everywhere convergence of $f^{(q)}$
to $f$ implies the almost everywhere convergence of its quantile function $f^{*(q)}$ to the quantile
function of the limit, $f^*$ (van der Vaart (1998), p. 305). Since (A.1) holds for each q along the
subsequence, the dominated convergence theorem implies that (A.1) also holds for the general
case.

It remains to show the existence of the subsequence in the preceding paragraph. Using
series expansion in the Haar basis, any function in $L^2[0, 1]$ can be approximated in $L^2$ norm
by a sequence of r-step functions, where $r = 2^j$ and $j = 1, \ldots, \infty$ (Pollard (2002), p. 305).
Hence there is a subsequence of step functions $f^{(j)}$ and $f_0^{(j)}$ converging to $f$ and $f_0$ in $L^2$
norm; the functions in the subsequence necessarily take values in K; by Pollard (2002), p. 38,
we can extract a further subsequence $f^{(q)}$ and $f_0^{(q)}$, with q running over an increasing sequence
of integers, converging to $f$ and $f_0$ almost everywhere. Finally, replace $f_0^{(q)}$ by their quantile
functions, i.e., rearrangements, which retain the almost everywhere convergence property to
$f_0$ by van der Vaart (1998), p. 305. □

Proof of Part 2. Consider the step functions, as defined in the proof of Part 1. By setting
r sufficiently large, we can take them to satisfy the following hypotheses: there exist regions
$\mathcal{X}_0$ and $\mathcal{X}_0'$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$, we
have that (i) $x' > x$, (ii) $f(x) > f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for $\epsilon > 0$ specified
in the proposition. For any strictly submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$ we have that
$\eta = \inf\{L(v', t) + L(v, t') - L(v, t) - L(v', t')\} > 0$, where the infimum is taken over all
$v, v', t, t'$ in the set K such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$. We can begin sorting by exchanging
an element $f(x)$, $x \in \mathcal{X}_0$, of r-vector $f$ with an element $f(x')$, $x' \in \mathcal{X}_0'$, of r-vector $f$. This
induces a sorting gain of at least $\eta$ times 1/r. The total mass of points that can be sorted in
this way is at least $\delta$. We then proceed to sort all of these points in this way, and then continue
with the sorting of other points. After the sorting is completed, the total gain from sorting is at
least $\delta\eta$. That is, $\int_{\mathcal{X}} L\{f^*(x), f_0(x)\} \, dx \le \int_{\mathcal{X}} L\{f(x), f_0(x)\} \, dx - \delta\eta$.

We then extend this inequality to the general measurable functions exactly as in the proof
of Part 1. □
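
As a worked illustration of the constant $\eta$ in the proof of Part 2 (an example added here, assuming the squared loss $L(v, t) = (v - t)^2$, which is not singled out in the proof itself), the sorting gain can be computed in closed form:

```latex
\begin{aligned}
L(v', t) + L(v, t') - L(v, t) - L(v', t')
  &= (v' - t)^2 + (v - t')^2 - (v - t)^2 - (v' - t')^2 \\
  &= -2v't - 2vt' + 2vt + 2v't' \\
  &= 2 (v' - v)(t' - t) \;\ge\; 2\epsilon^2
  \quad \text{whenever } v' \ge v + \epsilon,\ t' \ge t + \epsilon .
\end{aligned}
```

Under this assumption $\eta = 2\epsilon^2$, so the bound in Part 2 reads $\int_{\mathcal{X}} \{f^*(x) - f_0(x)\}^2 \, dx \le \int_{\mathcal{X}} \{f(x) - f_0(x)\}^2 \, dx - 2\epsilon^2\delta$.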

Proof of Proposition 2. Proof of Part 1. We prove the claim by induction. It is true for
d = 1 by $f^*$ being a quantile function. Suppose the claim is true in $d - 1 \ge 1$ dimensions. If so,
then $x_{-j} \mapsto \tilde{f}(x_j, x_{-j})$, obtained from the original estimate $f$ after applying the rearrangement
to all arguments $x_{-j}$ of x, except for the argument $x_j$, must be weakly increasing in $x_{-j}$ for
each $x_j$. Thus, for any $x_{-j}' \ge x_{-j}$ and $X_j \sim U[0, 1]$, we have

$$\tilde{f}(X_j, x_{-j}') \ge \tilde{f}(X_j, x_{-j}). \qquad \text{(A.2)}$$

Therefore, the random variable on the left of (A.2) dominates the random variable on the right
of (A.2) in the stochastic sense. Therefore, the quantile function of the random variable on the
left dominates the quantile function of the random variable on the right, namely
$\tilde{f}^*_j(x_j, x_{-j}') \ge \tilde{f}^*_j(x_j, x_{-j})$ for each $x_j \in \mathcal{X} = [0, 1]$. Moreover, for each $x_{-j}$, the function
$x_j \mapsto \tilde{f}^*_j(x_j, x_{-j})$ is weakly increasing by virtue of being a quantile function. We conclude
therefore that $x \mapsto \tilde{f}^*_j(x)$ is weakly increasing in all of its arguments at all points $x \in \mathcal{X}^d$. The
claim of Part 1 of the Proposition now follows by induction. □

Proof of Part 2 (a). By Proposition 1, we have that for each $x_{-j}$,

$$\int_{\mathcal{X}} |f^*_j(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j \le \int_{\mathcal{X}} |f(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j. \qquad \text{(A.3)}$$

Now, the claim follows by integrating with respect to $x_{-j}$ and taking the p-th root of both
sides. For $p = \infty$, the claim follows by taking the limit as $p \to \infty$. □

Proof of Part 2 (b). We first apply the inequality of Part 2(a) to $f(x)$, then to
$R_{\pi_d} f(x)$, then to $R_{\pi_{d-1}} R_{\pi_d} f(x)$, and so on. In doing so, we recursively generate
a sequence of weak inequalities that imply the inequality (2.4) stated in the Proposition. □

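Proof of Part 3 (a). For each $x_{-j} \in \mathcal{X}^{d-1} \setminus \mathcal{X}_{-j}$, by Part 2(a), we have the weak inequality
(A.3), and for each $x_{-j} \in \mathcal{X}_{-j}$, by the inequality for the univariate case stated in Proposition
1 Part 2, we have the strong inequality

$$\int_{\mathcal{X}} |f^*_j(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j \le \int_{\mathcal{X}} |f(x_j, x_{-j}) - f_0(x_j, x_{-j})|^p \, dx_j - \eta_p \delta, \qquad \text{(A.4)}$$

where $\eta_p$ is defined in the same way as in Proposition 1. Integrating the weak inequality (A.3)
over $x_{-j} \in \mathcal{X}^{d-1} \setminus \mathcal{X}_{-j}$, of measure $1 - \nu$, and the strong inequality (A.4) over $\mathcal{X}_{-j}$, of measure
$\nu$, we obtain

$$\int_{\mathcal{X}^d} |f^*_j(x) - f_0(x)|^p \, dx \le \int_{\mathcal{X}^d} |f(x) - f_0(x)|^p \, dx - \eta_p \delta \nu. \qquad \text{(A.5)}$$

The claim now follows. □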

Proof of Part 3 (b). As in Part 2(a), we can recursively obtain a sequence of weak inequalities
describing the improvements in estimation error from rearranging sequentially with respect to
the individual arguments. Moreover, at least one of the inequalities can be strengthened to
be of the form stated in (A.5), from the assumption of the claim. The resulting system of
inequalities yields the inequality (2.5), stated in the proposition. □

Proof of Part 4. This part follows from homogeneity and subadditivity of the $L^p$ norm. □

Proof of Proposition 3. Proof of Part 1. The monotonicity follows from Proposition 2. The
rest of the proof relies on establishing the order-preserving property of the $\pi$-rearrangement
operator: for any measurable functions $g, m : \mathcal{X}^d \to \mathbb{R}$, we have that $g(x) \le m(x)$ for all
$x \in \mathcal{X}^d$ implies $g^*(x) \le m^*(x)$ for all $x \in \mathcal{X}^d$. Given the property we have that
$\ell(x) \le f(x) \le u(x)$ for all $x \in \mathcal{X}^d$ implies $\ell^*(x) \le f^*(x) \le u^*(x)$ for all $x \in \mathcal{X}^d$, which
verifies the claim of the first part. The claim also extends to the average multivariate
rearrangement, since averaging preserves the order-preserving property.

It remains to establish the order-preserving property for $\pi$-rearrangement, which we do
by induction. We first note that in the univariate case, when d = 1, order preservation
is obvious from the rearrangement being a quantile function: the random variable $m(X)$,
where $X \sim U[0, 1]$, dominates the random variable $g(X)$ in the stochastic sense, hence the
quantile function $m^*(x)$ of $m(X)$ must be weakly greater than the quantile function $g^*(x)$ of
$g(X)$ for each $x \in \mathcal{X}$. We then extend this to the multivariate case by induction: Suppose
the order-preserving property is true for any $d - 1 \ge 1$. If so, then, for each $x_j \in \mathcal{X}$ and
$x_{-j} \in \mathcal{X}^{d-1}$, $g(x_j, x_{-j}) \le m(x_j, x_{-j})$ implies $\tilde{g}(x_j, x_{-j}) \le \tilde{m}(x_j, x_{-j})$, where $\tilde{g}$ and $\tilde{m}$ are
multivariate rearrangements of $x_{-j} \mapsto g(x_j, x_{-j})$ and $x_{-j} \mapsto m(x_j, x_{-j})$ with respect to $x_{-j}$,
holding $x_j$ fixed. Now apply the order-preserving property of the univariate rearrangement to
the univariate functions $x_j \mapsto \tilde{g}(x_j, x_{-j})$ and $x_j \mapsto \tilde{m}(x_j, x_{-j})$, holding $x_{-j}$ fixed, for each
$x_{-j}$, to conclude that the order-preserving property holds for dimension d. □

Proof of Part 2. As stated in the text, the weak inequality follows from Lorentz (1953). For
completeness we only briefly note that the proof follows similarly to the proof of Proposition
1. Indeed, we can start with step functions $\ell$ and $u$ and work with their equivalent vector
representations $\ell$ and $u$. Then we apply the sorting operator $S_2$ to the pair of r-vectors $(\ell, u)$
defined as $S_2(\ell, u) = (S\ell, S'u)$, where S is the sorting operator on the vector $\ell$ defined in the
proof of Proposition 1, and $S'$ is a subordinated sorting operator on the vector $u$ defined by two
conditions: (1) if S exchanges the k-th and m-th elements of $\ell$, where m > k, then, if $u_k > u_m$,
$S'$ also exchanges the k-th and m-th elements of $u$, and if $u_k \le u_m$, $S'$ leaves all elements
of $u$ unchanged; and (2) if S exchanges no elements of $\ell$, i.e., $S\ell = \ell$, then $S'$ is simply the
unrestricted S operator as defined in the proof of Proposition 1, i.e., $S' = S$. By the definition of
submodularity (2.3), each application of $S_2$ weakly reduces submodular discrepancies between
vectors, so that the pairs of vectors in the sequence
$\{(\ell, u), S_2(\ell, u), \ldots, S_2 \cdots S_2(\ell, u), (\ell^*, u^*)\}$
become progressively weakly closer to each other, and the sequence can be taken to be finite,
where the last pair is the rearrangement $(\ell^*, u^*)$ of vectors $(\ell, u)$. The inequality extends to
general bounded measurable functions by passing to the limit using a similar argument to the
proof of Proposition 1. The extension of the proof to the multivariate case follows by induction
on the dimension, as in the proof of Proposition 2. □
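
The paired sorting operator described in this proof can be sketched as follows. This is an illustrative reading of the two conditions, with the assumption that S always exchanges the first out-of-order pair; the discrepancy $|u - \ell|^p$ with p = 2 serves as an example of a submodular function, and the band values are hypothetical.

```python
def first_inversion(v):
    """Return the first pair (k, m), m > k, with v[k] > v[m], or None if v is sorted."""
    for k in range(len(v)):
        for m in range(k + 1, len(v)):
            if v[k] > v[m]:
                return k, m
    return None


def s2_step(lower, upper):
    """One application of the paired operator (S on lower, subordinated S' on upper)."""
    lower, upper = list(lower), list(upper)
    pair = first_inversion(lower)
    if pair is not None:
        k, m = pair
        lower[k], lower[m] = lower[m], lower[k]
        if upper[k] > upper[m]:          # condition (1): swap upper only if out of order
            upper[k], upper[m] = upper[m], upper[k]
        return lower, upper
    pair = first_inversion(upper)        # condition (2): lower is sorted, sort upper freely
    if pair is not None:
        k, m = pair
        upper[k], upper[m] = upper[m], upper[k]
    return lower, upper


def discrepancy(lower, upper, p=2):
    return sum(abs(u - l) ** p for l, u in zip(lower, upper))


lower0 = [95.0, 104.0, 101.0, 118.0, 130.0, 126.0]   # hypothetical end-points
upper0 = [103.0, 116.0, 110.0, 131.0, 139.0, 146.0]

lower, upper = list(lower0), list(upper0)
path = [discrepancy(lower, upper)]
while True:
    nxt = s2_step(lower, upper)
    if nxt == (lower, upper):            # nothing left to exchange
        break
    lower, upper = nxt
    path.append(discrepancy(lower, upper))

print(lower, upper, path)
```

The loop ends at the pair of sorted vectors, and the recorded discrepancies form a weakly decreasing sequence, mirroring the finite sequence of pairs in the proof.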

Proof of Part 3. Finally, the proof of strict inequality in the univariate case is similar to
the proof of Proposition 2, using the fact that for strictly submodular functions
$L : \mathbb{R}^2 \to \mathbb{R}_+$ we have that $\eta = \inf\{L(v', t) + L(v, t') - L(v, t) - L(v', t')\} > 0$, where the
infimum is taken over all $v, v', t, t'$ in the set K such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$ or such
that $v \ge v' + \epsilon$ and $t \ge t' + \epsilon$. The extension of the strict inequality to the multivariate case
follows exactly as in the proof of Proposition 2. □

Appendix B. Design of the Monte-Carlo experiment 

The outcome variable Y equals a location function plus a disturbance $\epsilon$, $Y = Z(X)'\beta + \epsilon$,
and the disturbance is independent of the regressor X. The vector Z(X) includes a constant
and a piecewise linear transformation of the regressor X with three changes of slope, namely
$Z(X) = (1, X, 1\{X > 5\}(X - 5), 1\{X > 10\}(X - 10), 1\{X > 15\}(X - 15))$. This design implies
the conditional expectation function $E[Y \mid X] = Z(X)'\beta$, and the conditional quantile function
$Q_Y[u \mid X] = Z(X)'\beta + Q_\epsilon(u)$. We select the parameters of the design to match the growth
charts example. Thus, we set the parameter $\beta$ equal to the ordinary least squares estimate
obtained in the growth chart data, namely (71.25, 8.13, -2.72, 1.78, -6.43). This parameter
value and the location specification imply a model for the conditional expectation function
and quantile process that is monotone for ages 2-20. To generate the values of the dependent
variable, we draw disturbances from a normal distribution whose mean and variance match
those of the estimated residuals, $\epsilon = Y - Z(X)'\beta$. We fix the values of the regressor X to be the
observed values of age in the data. In each replication, we estimate the target functions using
the nonparametric methods described in Section 4.1. The total number of replications is 1000.
All computations were carried out using the software R (R Development Core Team 2008),
the quantile regression package quantreg, and the functional data analysis package fda. The
rearrangement method developed in this paper is available in the package rearrangement for
R.
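
The data-generating process of this design can be sketched as below. The coefficient vector is the one reported above; everything else is an assumption made for illustration: the paper's computations were done in R, whereas this sketch uses Python, the age grid is hypothetical (the paper uses the observed ages), and the disturbance standard deviation `SIGMA` is a placeholder because the residual variance is not reported in this appendix.

```python
import random

BETA = (71.25, 8.13, -2.72, 1.78, -6.43)  # OLS estimate reported in the text
SIGMA = 5.0  # hypothetical disturbance standard deviation (not given in the text)


def z(x):
    """Z(X) = (1, X, 1{X>5}(X-5), 1{X>10}(X-10), 1{X>15}(X-15))."""
    return (1.0, x, max(x - 5.0, 0.0), max(x - 10.0, 0.0), max(x - 15.0, 0.0))


def cef(x):
    """Conditional expectation function E[Y | X = x] = Z(x)' beta."""
    return sum(b * zi for b, zi in zip(BETA, z(x)))


def simulate(ages, sigma, rng):
    """Draw one Monte Carlo sample of heights given fixed ages."""
    return [cef(x) + rng.gauss(0.0, sigma) for x in ages]


ages = [2.0 + 0.5 * i for i in range(37)]   # hypothetical grid covering ages 2 to 20
rng = random.Random(0)
heights = simulate(ages, SIGMA, rng)
cef_values = [cef(x) for x in ages]
print(cef_values[0], cef_values[-1])
```

With these coefficients the implied slopes on the four age segments are 8.13, 5.41, 7.19, and 0.76, all positive, which is consistent with the statement that the conditional expectation function is monotone for ages 2-20.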